Bayesian Estimation of Disclosure Risks for Multiply Imputed, Synthetic Data

Authors

  • Jerome Reiter
  • Quanli Wang
  • Biyuan E. Zhang
Abstract

Many national statistical agencies, survey and research organizations, and businesses (henceforth all called agencies) collect data that they intend to share with others. These agencies strive to release data that (i) protect the confidentiality of data subjects’ identities and sensitive attributes, (ii) are informative for a wide range of analyses, and (iii) are relatively straightforward for secondary data analysts to use. Most strategies for meeting these three criteria involve altering data values, such as suppressing values, aggregating variables, swapping values across records [10], and adding random noise to data values [20]. An alternative to releasing datasets is to release perturbed results of user-specified queries [16]; we consider only dataset releases here.

Similar resources

Significance tests for multi-component estimands from multiply imputed, synthetic microdata

To limit the risks of disclosures when releasing data to the public, it has been suggested that statistical agencies release multiply imputed, synthetic microdata. For example, the released microdata can be fully synthetic, comprising random samples of units from the sampling frame with simulated values of variables. Or, the released microdata can be partially synthetic, comprising the units or...

Distribution-Preserving Statistical Disclosure Limitation

One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. A mis-specified imputation model can invalidate inferences based on the partially synthetic data, because the imputation model determines the distribution of s...

Distribution-preserving statistical disclosure limitation

One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. A mis-specified imputation model can invalidate inferences because the distribution of synthetic data is completely determined by the model used to generate th...

Likelihood Based Finite Sample Inference for Singly Imputed Synthetic Data Under the Multivariate Normal and Multiple Linear Regression Models

In this paper we develop likelihood-based finite sample inference based on singly imputed partially synthetic data, when the original data follow either a multivariate normal or a multiple linear regression model. We assume that the synthetic data are generated by using the plug-in sampling method, where unknown parameters in the data model are set equal to observed values of their point estima...

Synthetic Datasets for the German IAB Establishment Panel

Disseminating microdata to the public that provide a high level of data utility while at the same time guaranteeing the confidentiality of the survey respondent is a difficult task. Generating multiply imputed synthetic datasets is an innovative statistical disclosure limitation technique with the potential of enabling the data disseminating agency to achieve this twofold goal. So far, the appr...

Publication date: 2014